Collection of Functions used for Dynamic Analysis - Flow of Data over Time

Zubin Dowlaty, Pon Anureka Seenivasan, Vishwavani

2024-03-21

1. Abstract

The HVT package is a collection of R functions that facilitate building topology preserving maps for rich multivariate data analysis, with a focus on datasets tending towards big data, i.e., a large number of rows. A collection of R functions for this typical workflow is organized below:

  1. Data Compression: Vector quantization (VQ), HVQ (hierarchical vector quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Dimension projection of the compressed cells to a 1D, 2D, or interactive surface plot with Sammon's non-linear mapping algorithm. This step creates topology preserving map (also called embedding) coordinates in the desired output dimension.

  3. Tessellation: Create the cells required for object visualization using the Voronoi tessellation method; the package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map, and is useful for semi-supervised tasks.

  4. Scoring: Scoring new data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.

  5. Dynamic Analysis: A collection of functions designed to understand and visually represent the movement of data over time within a dynamic system, with the ability to forecast the next cell (t+1) by examining the underlying flow pattern.

2. Experimental setup

The Lorenz attractor is a three-dimensional figure generated by a set of differential equations that model a simple chaotic dynamic system of convective flow. It arises from a simplified set of equations describing a system of three variables, which represent the state of the system at any given time and are typically denoted (x, y, z). The equations are as follows:

\[ \frac{dx}{dt} = \sigma(y - x) \] \[ \frac{dy}{dt} = x(r - z) - y \] \[ \frac{dz}{dt} = xy - \beta z \] where dx/dt, dy/dt, and dz/dt represent the rates of change of x, y, and z respectively over time (t). σ, r, and β are constant parameters of the system: σ (σ = 10) controls the rate of convection, r (r = 28) controls the difference in temperature between the convective and stable regions, and β (β = 8/3) represents the ratio of the width to the height of the convective layer. When these equations are plotted in three-dimensional space, they produce a chaotic trajectory that never repeats. The Lorenz attractor exhibits sensitive dependence on initial conditions, meaning even small differences in the initial conditions can lead to drastically different trajectories over time. This sensitivity to initial conditions is a defining characteristic of chaotic systems.
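The trajectory can be reproduced with a simple fixed-step Euler integration in base R. This is a minimal sketch assuming a step size of dt = 0.01 and the initial state (0, 1, 20) seen in the dataset's first row; the packaged dataset appears to use a finer step and possibly a different solver:

```r
# Fixed-step Euler integration of the Lorenz system
# (sigma = 10, r = 28, beta = 8/3, as in the text)
lorenz <- function(n_steps, dt = 0.01, init = c(0, 1, 20)) {
  sigma <- 10; r <- 28; beta <- 8 / 3
  out <- matrix(NA_real_, nrow = n_steps, ncol = 3,
                dimnames = list(NULL, c("X", "Y", "Z")))
  s <- init
  for (i in seq_len(n_steps)) {
    out[i, ] <- s
    s <- s + dt * c(sigma * (s[2] - s[1]),     # dx/dt
                    s[1] * (r - s[3]) - s[2],  # dy/dt
                    s[1] * s[2] - beta * s[3]) # dz/dt
  }
  as.data.frame(out)
}

traj <- lorenz(10000)
head(traj, 3)
```

A smaller dt produces a trajectory closer to the packaged data.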

In this notebook, we will use the Lorenz attractor dataset, which contains 200,000 observations and 5 columns. The dataset can be downloaded from here

The dataset includes the following columns: X, Y, and Z (the coordinates of the system's state), U (velocity), and t (timestamp).

3. Importing Code Modules

Here is the guide to install the HVT package, which helps the user install its most recent version.

###direct installation###
#install.packages("HVT")

#or

###git repo installation###
#library(devtools)
#devtools::install_github(repo = "Mu-Sigma/HVT")

NOTE: At the time of documenting this vignette, the updated changes were not yet on CRAN, hence we source the scripts from the R folder directly into the session environment.

# Sourcing required code scripts for HVT
script_dir <- "../R"
r_files <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)
invisible(lapply(r_files, function(file) source(file, echo = FALSE)))

4. Data Understanding

Here, we load and explore the Lorenz attractor dataset. For the sake of brevity, we display only the first ten rows.

dataset <- read.csv("./sample_dataset/lorenze_attractor.csv")
dataset <- dataset %>% dplyr::select(X,Y,Z,U,t)
dataset$t <- round(dataset$t, 5)
Table(dataset, limit = 10)
X Y Z U t
0.0000000 1.0000000 20.00000 0.0000 0.00000
0.0024966 0.9997525 19.98669 0.0005 0.00025
0.0049863 0.9995101 19.97337 0.0010 0.00050
0.0074692 0.9992728 19.96006 0.0015 0.00075
0.0099454 0.9990405 19.94676 0.0020 0.00100
0.0124147 0.9988133 19.93347 0.0025 0.00125
0.0148774 0.9985912 19.92018 0.0030 0.00150
0.0173333 0.9983741 19.90691 0.0035 0.00175
0.0197826 0.9981621 19.89365 0.0040 0.00200
0.0222253 0.9979552 19.88040 0.0045 0.00225

Now let's visualize the Lorenz attractor (overlapping spirals) in 3D space.

data_3d <- dataset[sample(1:nrow(dataset), 1000), ]
plot_3d <- plotly::plot_ly(data_3d, x = ~X, y = ~Y, z = ~Z) %>%
  plotly::add_markers(marker = list(
    size = 2,
    symbol = "circle",
    color = ~Z,
    colorscale = "Bluered",
    colorbar = list(title = "Z")))
plot_3d

Figure 1: Lorenz attractor in 3D space

Now let's have a look at the structure of the Lorenz attractor dataset.

str(dataset)
#> 'data.frame':    200000 obs. of  5 variables:
#>  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
#>  $ Y: num  1 1 1 0.999 0.999 ...
#>  $ Z: num  20 20 20 20 19.9 ...
#>  $ U: num  0 0.0005 0.001 0.0015 0.002 ...
#>  $ t: num  0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...

Data distribution

This section displays four objects.

  1. Variable Histograms: The histogram distribution of all the variables in the dataset.

  2. Box Plots: Box plots for each numeric column in the dataset across panels. These plots display the median and interquartile range (IQR) of each column at a panel level.

  3. Correlation Matrix: This calculates the Pearson correlation, a bivariate measure of the linear correlation between two numeric columns. The output plot is shown as a matrix.

  4. Summary EDA: The table provides descriptive statistics for all the variables in the dataset.
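As a quick illustration of item 3, the Pearson correlation matrix underlying such a plot can be computed directly with base R's cor(); this toy data frame is hypothetical and not part of the edaPlots internals:

```r
# Pearson correlation matrix on a toy data frame
toy <- data.frame(a = 1:10,
                  b = (1:10) * 2,                        # perfectly linear in a
                  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))   # unrelated values
cor_mat <- cor(toy, method = "pearson")
round(cor_mat, 2)   # cor_mat["a", "b"] is 1: a perfect linear relationship
```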

The inbuilt function edaPlots displays the four objects mentioned above.

edaPlots(dataset, time_series = TRUE, time_column = 't')
variable min 1st Quartile median mean sd 3rd Quartile max hist n_row n_missing
X -18.0202 -3.7356 0.8798 0.7083 7.8247 5.8663 16.7554 ▂▃▇▅▃ 2e+05 0
Y -24.2165 -3.4265 0.7270 0.6957 9.0070 5.4724 21.8814 ▁▂▇▃▂ 2e+05 0
Z 5.6491 15.8927 21.6277 23.2424 8.8526 30.6142 44.7478 ▃▇▅▅▂ 2e+05 0
U -10.0000 -3.9458 3.1532 1.8390 6.6585 8.1096 10.0000 ▅▃▃▃▇ 2e+05 0
t 0.0000 12.5000 25.0000 25.0000 14.4339 37.5000 50.0000 ▇▇▇▇▇ 2e+05 0

Train - Test Split

Let us split the dataset into train and test sets. We will sequentially select the first 80% of the data as train and the remainder as test.

noOfPoints <- dim(dataset)[1]
trainLength <- as.integer(noOfPoints * 0.8)
trainDataset <- dataset[1:trainLength,]
testDataset <- dataset[(trainLength+1):noOfPoints,]
rownames(testDataset) <- NULL

4.1 Training dataset

Let's have a look at the training dataset, which contains 160,000 data points. For the sake of brevity we display the first 10 rows.

Table(trainDataset, limit = 10)
X Y Z U t
0.0000000 1.0000000 20.00000 0.0000 0.00000
0.0024966 0.9997525 19.98669 0.0005 0.00025
0.0049863 0.9995101 19.97337 0.0010 0.00050
0.0074692 0.9992728 19.96006 0.0015 0.00075
0.0099454 0.9990405 19.94676 0.0020 0.00100
0.0124147 0.9988133 19.93347 0.0025 0.00125
0.0148774 0.9985912 19.92018 0.0030 0.00150
0.0173333 0.9983741 19.90691 0.0035 0.00175
0.0197826 0.9981621 19.89365 0.0040 0.00200
0.0222253 0.9979552 19.88040 0.0045 0.00225

Now let's have a look at the structure of the training dataset.

str(trainDataset)
#> 'data.frame':    160000 obs. of  5 variables:
#>  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
#>  $ Y: num  1 1 1 0.999 0.999 ...
#>  $ Z: num  20 20 20 20 19.9 ...
#>  $ U: num  0 0.0005 0.001 0.0015 0.002 ...
#>  $ t: num  0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...

Data Distribution

edaPlots(trainDataset, time_series = TRUE, time_column = 't')
variable min 1st Quartile median mean sd 3rd Quartile max hist n_row n_missing
X -18.0202 -3.6928 1.0917 0.8511 7.8501 6.1564 16.7554 ▂▃▇▆▃ 160000 0
Y -24.2165 -3.4047 0.9938 0.8913 9.0368 5.9268 21.8814 ▁▂▇▃▂ 160000 0
Z 5.6491 16.1278 21.8036 23.3181 8.7778 30.6148 44.7478 ▃▇▆▅▂ 160000 0
U -10.0000 -5.4029 2.8225 1.4319 6.9893 8.1504 10.0000 ▅▂▃▃▇ 160000 0
t 0.0000 10.0000 20.0000 20.0000 11.5471 30.0000 40.0000 ▇▇▇▇▇ 160000 0

4.2 Testing dataset

Let's have a look at the testing dataset, which contains 40,000 data points. For the sake of brevity we display the first 10 rows.

Table(testDataset, limit = 10)
X Y Z U t
16.05834 13.65882 39.59945 9.893524 40.00020
16.05229 13.60880 39.62776 9.893451 40.00045
16.04613 13.55869 39.65584 9.893379 40.00070
16.03985 13.50850 39.68367 9.893306 40.00095
16.03347 13.45823 39.71126 9.893233 40.00120
16.02698 13.40789 39.73861 9.893160 40.00145
16.02037 13.35746 39.76572 9.893087 40.00170
16.01366 13.30696 39.79259 9.893014 40.00195
16.00684 13.25639 39.81921 9.892941 40.00220
15.99991 13.20574 39.84559 9.892868 40.00245

Now let's have a look at the structure of the testing dataset.

str(testDataset)
#> 'data.frame':    40000 obs. of  5 variables:
#>  $ X: num  16.1 16.1 16 16 16 ...
#>  $ Y: num  13.7 13.6 13.6 13.5 13.5 ...
#>  $ Z: num  39.6 39.6 39.7 39.7 39.7 ...
#>  $ U: num  9.89 9.89 9.89 9.89 9.89 ...
#>  $ t: num  40 40 40 40 40 ...

Data Distribution

edaPlots(testDataset, time_series = TRUE, time_column = 't')
variable min 1st Quartile median mean sd 3rd Quartile max hist n_row n_missing
X -16.2606 -3.9065 -0.0464 0.1371 7.6957 4.4283 16.0583 ▂▃▇▃▂ 40000 0
Y -20.9897 -3.5599 -0.5983 -0.0863 8.8440 3.5431 19.5597 ▂▂▇▂▂ 40000 0
Z 7.9115 15.0266 20.8133 22.9399 9.1395 30.6121 41.3323 ▆▇▅▃▅ 40000 0
U -5.4402 -0.7516 4.1210 3.4677 4.7921 7.9847 9.8935 ▃▃▃▅▇ 40000 0
t 40.0002 42.5001 45.0001 45.0001 2.8868 47.5001 50.0000 ▇▇▇▇▇ 40000 0

5. Model Training and Visualization

We will use the trainHVT function to compress our dataset while preserving essential features.

Model Parameters

NOTE: The compression is applied only to the X, Y, Z coordinates and not to U (velocity) and t (timestamp). After training and scoring, we merge the U and t columns back with the dataset.

set.seed(240)
hvt.results <- trainHVT(
  trainDataset[,-c(4:5)],
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let's check out the compression summary.

displayTable(data = hvt.results[[3]]$compression_summary, columnName = 'percentOfCellsBelowQuantizationErrorThreshold', value = 0.8, tableType = "compression")
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 100 0 0 n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

NOTE: The table shows that the 'percentOfCellsBelowQuantizationErrorThreshold' value is zero, indicating that compression hasn't taken place for the specified number of cells (100). Typically, we would keep increasing n_cells until at least 80% of cells fall below the quantization error threshold. However, in this vignette demonstration, we do not do so because the plots generated by the dynamic analysis functions would become cluttered and complex, making the explanations less clear.

Now, Let’s plot the Voronoi tessellation for 100 cells.

Figure 2: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset 'lorenz attractor'


6. Scoring

Now that we have built the model, let us score our testing dataset.

set.seed(240)
dataset_score <- testDataset[,-c(4:5)]
scoring_var <- scoreHVT(
  dataset_score,
  hvt.results,
  child.level = 1)

The flow map functions mentioned in the next section require the Cell ID from the scoring output and the sorted timestamp from the dataset used for scoring. So we merge them both to get a modified data frame that pairs cell IDs with their respective timestamps.

Let's see which cell and level each point belongs to, along with the sorted timestamp. For the sake of brevity, we will show only the first 100 rows.

scored_data <- scoring_var[["scoredPredictedData"]] %>%
  round(2) %>%
  cbind(testDataset) %>%
  as.data.frame()

colnames(scored_data) <- c("Segment.Level", "Segment.Parent", "Segment.Child", "n","Cell.ID",
                           "Quant.Error", "pred_X", "pred_Y", "pred_Z", "centroidRadius",
                           "diff", "anomalyFlag", "X", "Y", "Z", "U", "t")

displayTable(data = scored_data, columnName = 'Quant.Error', value = 0.1, tableType = "summary", limit = 100)
Segment.Level Segment.Parent Segment.Child n Cell.ID Quant.Error pred_X pred_Y pred_Z centroidRadius diff anomalyFlag X Y Z U t
1 1 27 1 4 0.11 1.94 1.41 1.85 0.17 0.07 0 16.06 13.66 39.60 9.89 40.00
1 1 27 1 4 0.1 1.94 1.41 1.86 0.17 0.07 0 16.05 13.61 39.63 9.89 40.00
1 1 27 1 4 0.1 1.94 1.40 1.86 0.17 0.07 0 16.05 13.56 39.66 9.89 40.00
1 1 27 1 4 0.1 1.93 1.40 1.86 0.17 0.08 0 16.04 13.51 39.68 9.89 40.00
1 1 27 1 4 0.09 1.93 1.39 1.87 0.17 0.08 0 16.03 13.46 39.71 9.89 40.00
1 1 27 1 4 0.09 1.93 1.39 1.87 0.17 0.08 0 16.03 13.41 39.74 9.89 40.00
1 1 27 1 4 0.09 1.93 1.38 1.87 0.17 0.09 0 16.02 13.36 39.77 9.89 40.00
1 1 27 1 4 0.08 1.93 1.37 1.88 0.17 0.09 0 16.01 13.31 39.79 9.89 40.00
1 1 27 1 4 0.08 1.93 1.37 1.88 0.17 0.09 0 16.01 13.26 39.82 9.89 40.00
1 1 27 1 4 0.08 1.93 1.36 1.88 0.17 0.10 0 16.00 13.21 39.85 9.89 40.00
1 1 27 1 4 0.07 1.93 1.36 1.89 0.17 0.10 0 15.99 13.16 39.87 9.89 40.00
1 1 27 1 4 0.07 1.93 1.35 1.89 0.17 0.10 0 15.99 13.10 39.90 9.89 40.00
1 1 27 1 4 0.07 1.93 1.35 1.89 0.17 0.10 0 15.98 13.05 39.92 9.89 40.00
1 1 27 1 4 0.07 1.93 1.34 1.89 0.17 0.11 0 15.97 13.00 39.95 9.89 40.00
1 1 27 1 4 0.07 1.93 1.33 1.90 0.17 0.11 0 15.96 12.95 39.97 9.89 40.00
1 1 27 1 4 0.06 1.92 1.33 1.90 0.17 0.11 0 15.96 12.90 40.00 9.89 40.00
1 1 27 1 4 0.06 1.92 1.32 1.90 0.17 0.11 0 15.95 12.85 40.02 9.89 40.00
1 1 27 1 4 0.06 1.92 1.32 1.91 0.17 0.11 0 15.94 12.80 40.05 9.89 40.00
1 1 27 1 4 0.06 1.92 1.31 1.91 0.17 0.11 0 15.93 12.75 40.07 9.89 40.00
1 1 27 1 4 0.06 1.92 1.31 1.91 0.17 0.11 0 15.92 12.70 40.10 9.89 40.00
1 1 27 1 4 0.06 1.92 1.30 1.91 0.17 0.11 0 15.92 12.64 40.12 9.89 40.01
1 1 27 1 4 0.06 1.92 1.29 1.92 0.17 0.12 0 15.91 12.59 40.14 9.89 40.01
1 1 27 1 4 0.06 1.92 1.29 1.92 0.17 0.12 0 15.90 12.54 40.17 9.89 40.01
1 1 27 1 4 0.05 1.92 1.28 1.92 0.17 0.12 0 15.89 12.49 40.19 9.89 40.01
1 1 27 1 4 0.05 1.91 1.28 1.92 0.17 0.12 0 15.88 12.44 40.21 9.89 40.01
1 1 27 1 4 0.05 1.91 1.27 1.93 0.17 0.12 0 15.87 12.39 40.23 9.89 40.01
1 1 27 1 4 0.05 1.91 1.27 1.93 0.17 0.12 0 15.87 12.33 40.26 9.89 40.01
1 1 27 1 4 0.05 1.91 1.26 1.93 0.17 0.12 0 15.86 12.28 40.28 9.89 40.01
1 1 27 1 4 0.05 1.91 1.25 1.93 0.17 0.13 0 15.85 12.23 40.30 9.89 40.01
1 1 27 1 4 0.04 1.91 1.25 1.94 0.17 0.13 0 15.84 12.18 40.32 9.89 40.01
1 1 27 1 4 0.04 1.91 1.24 1.94 0.17 0.13 0 15.83 12.13 40.34 9.89 40.01
1 1 27 1 4 0.04 1.91 1.24 1.94 0.17 0.13 0 15.82 12.08 40.36 9.89 40.01
1 1 27 1 4 0.04 1.91 1.23 1.94 0.17 0.13 0 15.81 12.02 40.38 9.89 40.01
1 1 27 1 4 0.04 1.90 1.23 1.95 0.17 0.13 0 15.80 11.97 40.41 9.89 40.01
1 1 27 1 4 0.04 1.90 1.22 1.95 0.17 0.13 0 15.79 11.92 40.43 9.89 40.01
1 1 27 1 4 0.04 1.90 1.21 1.95 0.17 0.14 0 15.78 11.87 40.45 9.89 40.01
1 1 27 1 4 0.03 1.90 1.21 1.95 0.17 0.14 0 15.77 11.82 40.47 9.89 40.01
1 1 27 1 4 0.03 1.90 1.20 1.96 0.17 0.14 0 15.76 11.76 40.48 9.89 40.01
1 1 27 1 4 0.03 1.90 1.20 1.96 0.17 0.14 0 15.75 11.71 40.50 9.89 40.01
1 1 27 1 4 0.04 1.90 1.19 1.96 0.17 0.14 0 15.74 11.66 40.52 9.89 40.01
1 1 27 1 4 0.04 1.90 1.19 1.96 0.17 0.13 0 15.73 11.61 40.54 9.89 40.01
1 1 27 1 4 0.04 1.89 1.18 1.96 0.17 0.13 0 15.72 11.55 40.56 9.89 40.01
1 1 27 1 4 0.04 1.89 1.17 1.97 0.17 0.13 0 15.71 11.50 40.58 9.89 40.01
1 1 27 1 4 0.05 1.89 1.17 1.97 0.17 0.13 0 15.70 11.45 40.60 9.89 40.01
1 1 27 1 4 0.05 1.89 1.16 1.97 0.17 0.12 0 15.69 11.40 40.61 9.89 40.01
1 1 27 1 4 0.05 1.89 1.16 1.97 0.17 0.12 0 15.68 11.35 40.63 9.89 40.01
1 1 27 1 4 0.05 1.89 1.15 1.97 0.17 0.12 0 15.67 11.29 40.65 9.89 40.01
1 1 27 1 4 0.05 1.89 1.15 1.98 0.17 0.12 0 15.66 11.24 40.67 9.89 40.01
1 1 27 1 4 0.06 1.88 1.14 1.98 0.17 0.12 0 15.65 11.19 40.68 9.89 40.01
1 1 27 1 4 0.06 1.88 1.13 1.98 0.17 0.11 0 15.63 11.14 40.70 9.89 40.01
1 1 27 1 4 0.06 1.88 1.13 1.98 0.17 0.11 0 15.62 11.08 40.72 9.89 40.01
1 1 27 1 4 0.06 1.88 1.12 1.98 0.17 0.11 0 15.61 11.03 40.73 9.89 40.01
1 1 27 1 4 0.06 1.88 1.12 1.99 0.17 0.11 0 15.60 10.98 40.75 9.89 40.01
1 1 27 1 4 0.07 1.88 1.11 1.99 0.17 0.11 0 15.59 10.93 40.76 9.89 40.01
1 1 27 1 4 0.07 1.88 1.10 1.99 0.17 0.10 0 15.58 10.87 40.78 9.89 40.01
1 1 27 1 4 0.07 1.87 1.10 1.99 0.17 0.10 0 15.57 10.82 40.79 9.89 40.01
1 1 27 1 4 0.07 1.87 1.09 1.99 0.17 0.10 0 15.55 10.77 40.81 9.89 40.01
1 1 27 1 4 0.07 1.87 1.09 1.99 0.17 0.10 0 15.54 10.72 40.82 9.89 40.01
1 1 27 1 4 0.08 1.87 1.08 2.00 0.17 0.10 0 15.53 10.66 40.84 9.89 40.01
1 1 27 1 4 0.08 1.87 1.08 2.00 0.17 0.09 0 15.52 10.61 40.85 9.89 40.01
1 1 27 1 4 0.08 1.87 1.07 2.00 0.17 0.09 0 15.50 10.56 40.86 9.89 40.02
1 1 27 1 4 0.08 1.87 1.06 2.00 0.17 0.09 0 15.49 10.51 40.88 9.89 40.02
1 1 27 1 4 0.09 1.86 1.06 2.00 0.17 0.09 0 15.48 10.45 40.89 9.89 40.02
1 1 27 1 4 0.09 1.86 1.05 2.00 0.17 0.08 0 15.47 10.40 40.90 9.89 40.02
1 1 27 1 4 0.09 1.86 1.05 2.00 0.17 0.08 0 15.45 10.35 40.92 9.89 40.02
1 1 27 1 4 0.1 1.86 1.04 2.01 0.17 0.08 0 15.44 10.30 40.93 9.89 40.02
1 1 27 1 4 0.1 1.86 1.03 2.01 0.17 0.07 0 15.43 10.24 40.94 9.89 40.02
1 1 27 1 4 0.1 1.86 1.03 2.01 0.17 0.07 0 15.42 10.19 40.95 9.89 40.02
1 1 27 1 4 0.1 1.85 1.02 2.01 0.17 0.07 0 15.40 10.14 40.97 9.89 40.02
1 1 27 1 4 0.11 1.85 1.02 2.01 0.17 0.06 0 15.39 10.09 40.98 9.89 40.02
1 1 27 1 4 0.11 1.85 1.01 2.01 0.17 0.06 0 15.38 10.03 40.99 9.89 40.02
1 1 27 1 4 0.11 1.85 1.01 2.01 0.17 0.06 0 15.36 9.98 41.00 9.89 40.02
1 1 27 1 4 0.12 1.85 1.00 2.02 0.17 0.06 0 15.35 9.93 41.01 9.89 40.02
1 1 27 1 4 0.12 1.85 0.99 2.02 0.17 0.05 0 15.34 9.88 41.02 9.89 40.02
1 1 27 1 4 0.12 1.84 0.99 2.02 0.17 0.05 0 15.32 9.82 41.03 9.89 40.02
1 1 27 1 4 0.12 1.84 0.98 2.02 0.17 0.05 0 15.31 9.77 41.04 9.89 40.02
1 1 27 1 4 0.13 1.84 0.98 2.02 0.17 0.04 0 15.29 9.72 41.05 9.89 40.02
1 1 27 1 4 0.13 1.84 0.97 2.02 0.17 0.04 0 15.28 9.67 41.06 9.89 40.02
1 1 66 1 8 0.13 1.84 0.97 2.02 0.19 0.06 0 15.27 9.62 41.07 9.89 40.02
1 1 66 1 8 0.13 1.83 0.96 2.02 0.19 0.06 0 15.25 9.56 41.08 9.89 40.02
1 1 66 1 8 0.13 1.83 0.95 2.02 0.19 0.06 0 15.24 9.51 41.09 9.89 40.02
1 1 66 1 8 0.13 1.83 0.95 2.03 0.19 0.06 0 15.22 9.46 41.10 9.89 40.02
1 1 66 1 8 0.12 1.83 0.94 2.03 0.19 0.07 0 15.21 9.41 41.11 9.89 40.02
1 1 66 1 8 0.12 1.83 0.94 2.03 0.19 0.07 0 15.19 9.35 41.12 9.89 40.02
1 1 66 1 8 0.12 1.83 0.93 2.03 0.19 0.07 0 15.18 9.30 41.12 9.89 40.02
1 1 66 1 8 0.12 1.82 0.92 2.03 0.19 0.07 0 15.16 9.25 41.13 9.89 40.02
1 1 66 1 8 0.11 1.82 0.92 2.03 0.19 0.08 0 15.15 9.20 41.14 9.89 40.02
1 1 66 1 8 0.11 1.82 0.91 2.03 0.19 0.08 0 15.14 9.15 41.15 9.89 40.02
1 1 66 1 8 0.11 1.82 0.91 2.03 0.19 0.08 0 15.12 9.09 41.15 9.89 40.02
1 1 66 1 8 0.11 1.82 0.90 2.03 0.19 0.08 0 15.10 9.04 41.16 9.89 40.02
1 1 66 1 8 0.11 1.81 0.90 2.03 0.19 0.08 0 15.09 8.99 41.17 9.89 40.02
1 1 66 1 8 0.1 1.81 0.89 2.03 0.19 0.09 0 15.07 8.94 41.17 9.89 40.02
1 1 66 1 8 0.1 1.81 0.88 2.03 0.19 0.09 0 15.06 8.89 41.18 9.89 40.02
1 1 66 1 8 0.1 1.81 0.88 2.04 0.19 0.09 0 15.04 8.83 41.18 9.89 40.02
1 1 66 1 8 0.1 1.81 0.87 2.04 0.19 0.09 0 15.03 8.78 41.19 9.89 40.02
1 1 66 1 8 0.09 1.80 0.87 2.04 0.19 0.10 0 15.01 8.73 41.20 9.89 40.02
1 1 66 1 8 0.09 1.80 0.86 2.04 0.19 0.10 0 15.00 8.68 41.20 9.89 40.02
1 1 66 1 8 0.09 1.80 0.86 2.04 0.19 0.10 0 14.98 8.63 41.21 9.89 40.02
1 1 66 1 8 0.09 1.80 0.85 2.04 0.19 0.10 0 14.96 8.58 41.21 9.89 40.02
1 1 66 1 8 0.08 1.80 0.84 2.04 0.19 0.11 0 14.95 8.52 41.21 9.89 40.02

7. Timeseries plot with State Transitions

Let's look at the function plotStateTransition, which is used to create a time series plotly object of cell-level state transitions.

plotStateTransition(
       df,
       sample_size,
       line_plot,
       cellid_column,
       time_column 
)
plotStateTransition(df = scored_data, cellid_column = "Cell.ID", time_column = "t", sample_size = 1)
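Conceptually, the state transition plot is a step chart of cell IDs against time. A minimal base R sketch on synthetic cell IDs (hypothetical values, not the plotly output of plotStateTransition) conveys the idea:

```r
# Synthetic, time-ordered cell IDs (hypothetical values for illustration)
set.seed(240)
t_vals   <- seq(0, 10, by = 0.1)
cell_ids <- cumsum(sample(c(0L, 0L, 0L, 1L), length(t_vals),
                          replace = TRUE)) %% 5L + 1L

# Step plot: which cell each observation occupies at each timestamp
plot(t_vals, cell_ids, type = "s",
     xlab = "t", ylab = "Cell.ID", main = "State transitions over time")
```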


8. Transition probability tables

getTransitionProbability(
        df, 
        cellid_column, 
        time_column)

This function displays the probability of the t+1 state for every cell in the form of a table. For the sake of brevity we display the probability tables for Cell IDs 1 to 5.

trans_table <- getTransitionProbability(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
Table(trans_table[[1]])
Current_State Next_State Relative_Frequency Probability_Percentage
1 1 297 0.9867
1 4 3 0.0100
1 6 1 0.0033
Table(trans_table[[2]])
Current_State Next_State Relative_Frequency Probability_Percentage
2 1 4 0.0089
2 2 445 0.9867
2 6 2 0.0044
Table(trans_table[[3]])
Current_State Next_State Relative_Frequency Probability_Percentage
3 2 5 0.0105
3 3 470 0.9874
3 10 1 0.0021
Table(trans_table[[4]])
Current_State Next_State Relative_Frequency Probability_Percentage
4 4 463 0.9872
4 8 5 0.0107
4 12 1 0.0021
Table(trans_table[[5]])
Current_State Next_State Relative_Frequency Probability_Percentage
5 3 4 0.0158
5 5 248 0.9802
5 9 1 0.0040
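Under the hood, such a table is an empirical first-order Markov estimate: count the observed t → t+1 transitions and normalize. A minimal base R sketch on a toy cell-ID sequence (not the getTransitionProbability implementation) is:

```r
# Toy, time-sorted sequence of Cell.IDs
cells <- c(1, 1, 1, 4, 4, 6, 1, 1, 4, 4)

from <- cells[-length(cells)]   # state at t
to   <- cells[-1]               # state at t + 1

# Transitions that leave cell 1
tab  <- table(to[from == 1])
prob <- tab / sum(tab)
data.frame(Current_State      = 1,
           Next_State         = as.integer(names(tab)),
           Relative_Frequency = as.integer(tab),
           Probability        = round(as.numeric(prob), 4))

# Forecasting the next cell (t + 1) amounts to taking the most likely row
names(which.max(prob))
```

Forecasting the next cell for a point thus reduces to picking the Next_State with the highest probability for its current cell.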